3,834 research outputs found

    When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

    Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold $\tau$. In contrast to previous work where $\tau$ is assumed to be quite close to 1, we focus on recommendation applications where $\tau$ is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small $\tau$. To the best of our knowledge, there is no practical solution for computing all user pairs with, say, $\tau = 0.2$ on large social networks, even using the power of distributed algorithms. Our work directly addresses this challenge by introducing a new algorithm, WHIMP, that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.
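The SimHash random projection idea mentioned in this abstract can be illustrated with a minimal sketch. This is not WHIMP itself (no wedge sampling, no MapReduce); it only shows how sign bits from random hyperplanes yield a cosine similarity estimate. The function names, the toy vectors, and the choice of 4096 bits are assumptions made for illustration.

```python
import math
import random


def simhash_signature(vec, planes):
    """One bit per random hyperplane: 1 if vec lies on its non-negative side."""
    return [
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    ]


def estimate_cosine(sig_a, sig_b):
    """The fraction of disagreeing bits estimates angle/pi between the vectors."""
    disagree = sum(a != b for a, b in zip(sig_a, sig_b))
    theta = math.pi * disagree / len(sig_a)
    return math.cos(theta)


def exact_cosine(u, v):
    """Plain cosine similarity, used here only to check the estimate."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy data: two users' 0/1 connection vectors over 8 accounts.
rng = random.Random(0)
dim, n_bits = 8, 4096
planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

u = [1, 1, 0, 1, 0, 0, 1, 0]
v = [1, 0, 0, 1, 1, 0, 1, 0]

estimate = estimate_cosine(simhash_signature(u, planes), simhash_signature(v, planes))
exact = exact_cosine(u, v)
print(f"exact={exact:.3f} estimate={estimate:.3f}")
```

More signature bits tighten the estimate at the cost of larger signatures, which is the usual trade-off with this kind of sketch.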

    A Memory-Efficient Sketch Method for Estimating High Similarities in Streaming Sets

    Estimating set similarity and detecting highly similar sets are fundamental problems in areas such as databases, machine learning, and information retrieval. MinHash is a well-known technique for approximating Jaccard similarity of sets and has been successfully used for many applications such as similarity search and large scale learning. Its two compressed versions, b-bit MinHash and Odd Sketch, can significantly reduce the memory usage of the original MinHash method, especially for estimating high similarities (i.e., similarities around 1). Although MinHash can be applied to static sets as well as streaming sets, whose elements arrive in a streaming fashion and whose cardinality is unknown or even infinite, b-bit MinHash and Odd Sketch unfortunately fail to deal with streaming data. To solve this problem, we design a memory efficient sketch method, MaxLogHash, to accurately estimate Jaccard similarities in streaming sets. Compared to MinHash, our method uses smaller sized registers (each register consists of less than 7 bits) to build a compact sketch for each set. We also provide a simple yet accurate estimator for inferring Jaccard similarity from MaxLogHash sketches. In addition, we derive formulas for bounding the estimation error and determine the smallest necessary memory usage (i.e., the number of registers used for a MaxLogHash sketch) for the desired accuracy. We conduct experiments on a variety of datasets, and experimental results show that our method MaxLogHash is about 5 times more memory efficient than MinHash with the same accuracy and computational cost for estimating high similarities.
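For readers unfamiliar with the baseline this abstract compares against, the following is a minimal sketch of plain MinHash, not of MaxLogHash, whose register layout the abstract does not spell out. The seeded hash built on `hashlib.blake2b`, the sketch size of 512, and the example sets are all assumptions chosen for illustration.

```python
import hashlib


def _hash(seed, item):
    """Seeded 64-bit hash; one seed stands in for one random permutation."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(items, k):
    """Keep the minimum hash value of the set under each of k seeded hashes."""
    return [min(_hash(seed, x) for x in items) for seed in range(k)]


def estimate_jaccard(sketch_a, sketch_b):
    """The fraction of matching registers estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(s, t):
    """Exact Jaccard similarity, used here only to check the estimate."""
    return len(s & t) / len(s | t)


# Hypothetical example sets with a large overlap (high similarity).
set_a = set(range(0, 100))
set_b = set(range(10, 110))

k = 512
jaccard_estimate = estimate_jaccard(minhash_sketch(set_a, k), minhash_sketch(set_b, k))
jaccard_exact = exact_jaccard(set_a, set_b)
print(f"exact={jaccard_exact:.3f} estimate={jaccard_estimate:.3f}")
```

Each register here holds a full 64-bit value. Shrinking that per-register footprint while still supporting streaming updates is the memory cost the abstract says MaxLogHash reduces.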

    FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

    We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging a LSH style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count based estimations, we reduce the computational and parallelization costs of similarity search, while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URL, click-through prediction, social networks, etc. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail on the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset, using brute-force ($n^2D$), will require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.

    Theory of quantum Hall effect and high Landau levels

    The angular momentum model which couples the spin and charge is discussed as a possible theory of the quantum Hall effect. The high Landau level filling fractions 5/2, 7/3 and 8/3 are understood by this model. It is found that 7/3 and 8/3 are the particle-hole conjugates and 5/2 arises due to a limiting level at 1/2 with Landau level number n=5 which makes the fraction as 5/2. Comment: 7 page

    Water Quality Analysis of the Lake and Weather Studies at Keoladeo National Park, Bharatpur

    Keoladeo National Park, situated between 27° 7.6' to 27° 12.2' N and 77° 29.5' to 77° 33.9' E, is 2 Km Southeast of Bharatpur city, 38 Km Southwest of Mathura and 50 Km West of Agra. Delhi is 180 Km North of Bharatpur. The total area of the park is about 29 Sq.Km. It is flat with a gentle slope towards the centre forming a depression, the total area of which is about 8.5 Sq.Km. This is the main submersible area of the park. The average elevation of the area is about 174 meters. The submersible area has been divided into various unequal compartments by means of dykes.

    Volume measurement using 3D Range Imaging

    3D Range Imaging has widespread applications. One of them is measuring the volumes of different objects. In this paper, 3D range imaging is used to find the volumes of different objects using two algorithms that are based on a straightforward means of calculating volume. The algorithms implemented successfully calculate the volume of objects, provided that the objects have uniform colour. Objects with multi-coloured and glossy surfaces presented particular difficulties in determining volume.

    Implementation of Improved Jigs and Fixtures in the Production of Non-Active Rotary Paddy Weeder

    The paddy weeder is a manual weeding tool used to control weeds in the concerned area. Presently, the manufacturing of paddy weeders employs traditional methods, i.e. the production process is executed without the use of jigs. Jigs are special work holding and tool guiding devices. The use of jigs and fixtures is an economical way to produce a component in mass, and hence they serve as an important component of a mass production system. The primary objective of this research was to increase production by reducing operation time, thus reducing the cost associated with it. The jigs and fixtures were first designed with the software SolidWorks. The implementation of improved jigs proved remarkably efficient and helped in saving 35% of the time elapsed in the production cycle, with increased worker safety, and hence proved helpful in facilitating mass production.

    A personal journey of studying positive psychology: reflections of undergraduate students in the United Arab Emirates

    Background: An increasing number of undergraduate positive psychology courses offer students a holistic view of the broader discipline of psychology. Even short-term participation in positive psychology activities as part of a taught course may improve psychological well-being and lower stress. However, there is a dearth of qualitative evidence on how students experience this learning process. Objective: This study aimed to explore UAE-based undergraduate students’ reflections on their experiences of an elective positive psychology course and their participation in various positive psychology interventions (PPIs). Method: This qualitative study explored 21 UAE-based undergraduate students’ reflections on taking a semester-long positive psychology course, in which they participated in PPIs. The rich data from semi-structured interviews were analyzed using reflexive thematic analysis. Results: Three main themes emerged, namely rethinking positive psychology, changes in perspective on happiness and search for positivity, and enhanced relationships. Conclusion and Teaching Implications: The study suggests that positive psychology may reach past the time and space of the taught course and have at least a short-term positive impact on students' mental and social lives. Findings from this study imply the potential of positive psychology in higher education and point towards further integration of such courses in undergraduate programs in the UAE and beyond